{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 数据结构总览\n",
    "前面几章已经将我们使用最频繁的三种数据结构做了介绍，本章进行总结一下，之后在其基础上，再介绍一下基本数据类型和很有用的中间类型。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "E:\\ML\\实战\\pandas实用教程\n"
     ]
    }
   ],
   "source": [
    "!cd"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import pandas as pd"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 1. 三种常用数据结构"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1.1 Series\n",
    "Series是由数据类型相同的元素构成的一维数据结构，具有列表和字典的特性。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 四个重要属性\n",
    "- Series.index\n",
    "- Series.name\n",
    "- Series.values\n",
    "- Series.dtype"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1.2 DataFrame\n",
    "DataFrame是由索引相同的Series构成的的一二维数据结构。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 四个重要属性\n",
    "- DataFrame.index\n",
    "- DataFrame.columns\n",
    "- DataFrame.values\n",
    "- DataFrame.dtypes"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1.3 Index\n",
    "Index是构成和操作Series、DataFrame的关键，其具有元组特性。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 三个重要属性\n",
    "- Index.name\n",
    "- Index.values\n",
    "- Index.dtype"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 2. 基本数据类型\n",
    "- 这些数据类型实际上都是numpy带来的;\n",
    "- 基本数据类型中不包括字符串类型，字符串都是存储为object_型；\n",
    "- 所以使用这些类型时，要加上前缀 `np.` 。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2.1 布尔型"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "columns = [u'类别',u'说明 ',u'简称']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style>\n",
       "    .dataframe thead tr:only-child th {\n",
       "        text-align: right;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: left;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>类别</th>\n",
       "      <th>说明</th>\n",
       "      <th>简称</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>bool_</td>\n",
       "      <td>compatible: Python bool</td>\n",
       "      <td>?</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>bool8</td>\n",
       "      <td>8 bits</td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      类别                      说明  简称\n",
       "0  bool_  compatible: Python bool  ?\n",
       "1  bool8                   8 bits   "
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data = [ ['bool_','compatible: Python bool','?'],\n",
    "         ['bool8','8 bits',''] ] \n",
    "pd.DataFrame(data, columns = columns)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2.2 整型"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2.2.1 有符号整型"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style>\n",
       "    .dataframe thead tr:only-child th {\n",
       "        text-align: right;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: left;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>类别</th>\n",
       "      <th>说明</th>\n",
       "      <th>简称</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>byte</td>\n",
       "      <td>compatible: C char</td>\n",
       "      <td>b</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>short</td>\n",
       "      <td>compatible: C short</td>\n",
       "      <td>h</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>intc</td>\n",
       "      <td>compatible: C int</td>\n",
       "      <td>i</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>int_</td>\n",
       "      <td>compatible: Python int</td>\n",
       "      <td>l</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>longlong</td>\n",
       "      <td>compatible: C long long</td>\n",
       "      <td>q</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>intp</td>\n",
       "      <td>large enough to fit a pointer</td>\n",
       "      <td>p</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>int8</td>\n",
       "      <td>8 bits</td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>int16</td>\n",
       "      <td>16 bits</td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>int32</td>\n",
       "      <td>32 bits</td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>int64</td>\n",
       "      <td>64 bits</td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "         类别                            说明  简称\n",
       "0      byte             compatible: C char  b\n",
       "1     short            compatible: C short  h\n",
       "2      intc              compatible: C int  i\n",
       "3      int_         compatible: Python int  l\n",
       "4  longlong        compatible: C long long  q\n",
       "5      intp  large enough to fit a pointer  p\n",
       "6      int8                         8 bits   \n",
       "7     int16                        16 bits   \n",
       "8     int32                        32 bits   \n",
       "9     int64                        64 bits   "
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data = [['byte','compatible: C char','b'],\n",
    "['short','compatible: C short','h'],\n",
    "['intc','compatible: C int','i'],\n",
    "['int_','compatible: Python int','l'],\n",
    "['longlong','compatible: C long long','q'],\n",
    "['intp','large enough to fit a pointer','p'],\n",
    "['int8','8 bits','' ],\n",
    "['int16','16 bits','' ],\n",
    "['int32','32 bits',''],\n",
    "['int64','64 bits','']]\n",
    "pd.DataFrame(data = data, columns = columns)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2.2.2 无符号整型"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style>\n",
       "    .dataframe thead tr:only-child th {\n",
       "        text-align: right;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: left;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>类别</th>\n",
       "      <th>说明</th>\n",
       "      <th>简称</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>ubyte</td>\n",
       "      <td>compatible: C unsigned char</td>\n",
       "      <td>B</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>ushort</td>\n",
       "      <td>compatible: C unsigned short</td>\n",
       "      <td>H</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>uintc</td>\n",
       "      <td>compatible: C unsigned int</td>\n",
       "      <td>I</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>uint</td>\n",
       "      <td>compatible: Python int</td>\n",
       "      <td>L</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>ulonglong</td>\n",
       "      <td>compatible: C long long</td>\n",
       "      <td>Q</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>uintp</td>\n",
       "      <td>large enough to fit a pointer</td>\n",
       "      <td>P</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>uint8</td>\n",
       "      <td>8 bits</td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>uint16</td>\n",
       "      <td>16 bits</td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>uint32</td>\n",
       "      <td>32 bits</td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>uint64</td>\n",
       "      <td>64 bits</td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          类别                            说明  简称\n",
       "0      ubyte    compatible: C unsigned char  B\n",
       "1     ushort   compatible: C unsigned short  H\n",
       "2      uintc     compatible: C unsigned int  I\n",
       "3       uint         compatible: Python int  L\n",
       "4  ulonglong        compatible: C long long  Q\n",
       "5      uintp  large enough to fit a pointer  P\n",
       "6      uint8                         8 bits   \n",
       "7     uint16                        16 bits   \n",
       "8     uint32                        32 bits   \n",
       "9     uint64                        64 bits   "
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data = [['ubyte','compatible: C unsigned char','B'],\n",
    "['ushort','compatible: C unsigned short','H'],\n",
    "['uintc','compatible: C unsigned int','I'],\n",
    "['uint','compatible: Python int','L'],\n",
    "['ulonglong','compatible: C long long','Q'],\n",
    "['uintp','large enough to fit a pointer','P'],\n",
    "['uint8','8 bits',''], \n",
    "['uint16','16 bits',''],\n",
    "['uint32','32 bits',''],\n",
    "['uint64','64 bits','']]\n",
    "pd.DataFrame(data = data, columns = columns)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2.3 浮点型"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style>\n",
       "    .dataframe thead tr:only-child th {\n",
       "        text-align: right;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: left;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>类别</th>\n",
       "      <th>说明</th>\n",
       "      <th>简称</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>half</td>\n",
       "      <td></td>\n",
       "      <td>e</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>single</td>\n",
       "      <td>compatible: C float</td>\n",
       "      <td>f</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>double</td>\n",
       "      <td>compatible: C double</td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>float_</td>\n",
       "      <td>compatible: Python float</td>\n",
       "      <td>d</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>longfloat</td>\n",
       "      <td>compatible: C long float</td>\n",
       "      <td>g</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>float16</td>\n",
       "      <td>16 bits</td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>float32</td>\n",
       "      <td>32 bits</td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>float64</td>\n",
       "      <td>64 bits</td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>float96</td>\n",
       "      <td>96 bits, platform?</td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>float128</td>\n",
       "      <td>128 bits, platform?</td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          类别                       说明  简称\n",
       "0       half                            e\n",
       "1     single       compatible: C float  f\n",
       "2     double      compatible: C double   \n",
       "3     float_  compatible: Python float  d\n",
       "4  longfloat  compatible: C long float  g\n",
       "5    float16                   16 bits   \n",
       "6    float32                   32 bits   \n",
       "7    float64                   64 bits   \n",
       "8    float96        96 bits, platform?   \n",
       "9   float128       128 bits, platform?   "
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data = [['half',' ','e'],\n",
    "['single','compatible: C float','f'],\n",
    "['double','compatible: C double',''],\n",
    "['float_','compatible: Python float','d'],\n",
    "['longfloat','compatible: C long float','g'],\n",
    "['float16','16 bits',''],\n",
    "['float32','32 bits',''],\n",
    "['float64','64 bits',''], \n",
    "['float96','96 bits, platform?',''], \n",
    "['float128','128 bits, platform?','']]\n",
    "pd.DataFrame(data = data, columns = columns)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2.4 复数型"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style>\n",
       "    .dataframe thead tr:only-child th {\n",
       "        text-align: right;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: left;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>类别</th>\n",
       "      <th>说明</th>\n",
       "      <th>简称</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>csingle</td>\n",
       "      <td></td>\n",
       "      <td>F</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>complex_</td>\n",
       "      <td>compatible: Python complex</td>\n",
       "      <td>D</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>clongfloat</td>\n",
       "      <td></td>\n",
       "      <td>G</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>complex64</td>\n",
       "      <td>two 32-bit floats</td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>complex128</td>\n",
       "      <td>two 64-bit floats</td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>complex192</td>\n",
       "      <td>two 96-bit floats, platform?</td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>complex256</td>\n",
       "      <td>two 128-bit floats, platform?</td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "           类别                            说明  简称\n",
       "0     csingle                                 F\n",
       "1    complex_     compatible: Python complex  D\n",
       "2  clongfloat                                 G\n",
       "3   complex64              two 32-bit floats   \n",
       "4  complex128              two 64-bit floats   \n",
       "5  complex192   two 96-bit floats, platform?   \n",
       "6  complex256  two 128-bit floats, platform?   "
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data = [['csingle',' ','F'],\n",
    "['complex_','compatible: Python complex','D'],\n",
    "['clongfloat',' ','G'],\n",
    "['complex64','two 32-bit floats',''], \n",
    "['complex128','two 64-bit floats',''], \n",
    "['complex192','two 96-bit floats, platform?',''], \n",
    "['complex256','two 128-bit floats, platform?','']]\n",
    "pd.DataFrame(data = data, columns = columns)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2.5 任意类型\n",
    "Object其实就是就是指向pyton的类类型object的一个引用。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style>\n",
       "    .dataframe thead tr:only-child th {\n",
       "        text-align: right;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: left;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>类别</th>\n",
       "      <th>说明</th>\n",
       "      <th>简称</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>object_</td>\n",
       "      <td>any Python object</td>\n",
       "      <td>O</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "        类别                说明  简称\n",
       "0  object_  any Python object  O"
      ]
     },
     "execution_count": 21,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data = [['object_','any Python object','O']]\n",
    "pd.DataFrame(data = data, columns = columns)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 3. 有用的中间类型"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3.1 `.str`\n",
    "这个中间类型可将object_类型的Series当做字符串来处理，有很多可用的字符串处理函数。在后面的章节会专门讲这个应用。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    a_b\n",
       "1    b_c\n",
       "2    c_d\n",
       "dtype: object"
      ]
     },
     "execution_count": 30,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s = pd.Series(['a_b','b_c','c_d'],dtype = 'object')\n",
    "s"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style>\n",
       "    .dataframe thead tr:only-child th {\n",
       "        text-align: right;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: left;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>0</th>\n",
       "      <th>1</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>a</td>\n",
       "      <td>b</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>b</td>\n",
       "      <td>c</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>c</td>\n",
       "      <td>d</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   0  1\n",
       "0  a  b\n",
       "1  b  c\n",
       "2  c  d"
      ]
     },
     "execution_count": 32,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s.str.split('_',expand = True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3.2 `.cat`\n",
    "这个中间类型专门处理类别类型，类别类型是机器学习中经常面对的一种特征属性，后面章节会讲到。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    1\n",
       "1    2\n",
       "2    3\n",
       "dtype: category\n",
       "Categories (3, int64): [1, 2, 3]"
      ]
     },
     "execution_count": 33,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s = pd.Series( [1,2,3], dtype = 'category')\n",
    "s"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Int64Index([1, 2, 3], dtype='int64')"
      ]
     },
     "execution_count": 35,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s.cat.categories"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3.3 `.dt`\n",
    "这个中间类型专门处理时间格式的Series，在时间序列分析中会用到。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 47,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0   2017-08-01\n",
       "1   2017-08-03\n",
       "2   2017-08-03\n",
       "dtype: datetime64[ns]"
      ]
     },
     "execution_count": 47,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s = pd.Series(['2017-08-01','2017-08-03','2017-08-03'], dtype = 'datetime64[ns]')\n",
    "s"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 48,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    2017\n",
       "1    2017\n",
       "2    2017\n",
       "dtype: int64"
      ]
     },
     "execution_count": 48,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s.dt.year"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.2rc2"
  },
  "toc": {
   "colors": {
    "hover_highlight": "#DAA520",
    "navigate_num": "#000000",
    "navigate_text": "#333333",
    "running_highlight": "#FF0000",
    "selected_highlight": "#FFD700",
    "sidebar_border": "#EEEEEE",
    "wrapper_background": "#FFFFFF"
   },
   "moveMenuLeft": true,
   "nav_menu": {
    "height": "318px",
    "width": "252px"
   },
   "navigate_menu": true,
   "number_sections": false,
   "sideBar": true,
   "threshold": "3",
   "toc_cell": false,
   "toc_section_display": "block",
   "toc_window_display": true,
   "widenNotebook": false
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
